Approximation of the truncated Zeta distribution and Zipf's law
Abstract
Zipf’s law appears in many application areas but has no closed-form expression, which can make it cumbersome to use. Since it coincides with the truncated version of the Zeta distribution, in this paper we propose three approximate closed-form expressions for the truncated Zeta distribution, which may be employed for Zipf’s law as well. All three approximations replace the sum occurring in Zipf’s law with an integral; they are named the integral approximation, the average integral approximation, and the trapezoidal approximation. While the first is shown to be of little use, the trapezoidal approximation exhibits an error that is typically below 1%, and as low as 0.1% when the Zipf parameter is below 1.
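The idea of replacing the normalizing sum with an integral can be illustrated with a minimal Python sketch. The abstract does not state the paper's formulas, so the expressions below are assumptions: they use the textbook integral of x^(-s) over [1, N] and the textbook trapezoidal correction, which may differ from the paper's own definitions. All function names are hypothetical.

```python
import math


def harmonic_exact(n, s):
    """Exact normalizing sum of Zipf's law: sum of k^(-s) for k = 1..n."""
    return sum(k ** -s for k in range(1, n + 1))


def integral_of_power(n, s):
    """Integral of x^(-s) over [1, n] (assumed form of an integral approximation)."""
    if s == 1.0:
        return math.log(n)
    return (n ** (1.0 - s) - 1.0) / (1.0 - s)


def harmonic_trapezoid(n, s):
    """Textbook trapezoidal rule: the integral plus half of the two end terms.

    This is an assumed form, not necessarily the paper's trapezoidal approximation.
    """
    return integral_of_power(n, s) + 0.5 * (1.0 + n ** -s)


def relative_error(approx, exact):
    """Relative error of an approximation with respect to the exact value."""
    return abs(approx - exact) / exact


if __name__ == "__main__":
    n, s = 1000, 0.8  # illustrative values only
    exact = harmonic_exact(n, s)
    print("integral  rel. error:", relative_error(integral_of_power(n, s), exact))
    print("trapezoid rel. error:", relative_error(harmonic_trapezoid(n, s), exact))
```

Under these assumptions, the probability of rank k would be approximated as `k ** -s / harmonic_trapezoid(n, s)`, which avoids summing n terms.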
Similar resources
A Stochastic Process for Word Frequency Distributions
A stochastic model based on insights of Mandelbrot (1953) and Simon (1955) is discussed against the background of new criteria of adequacy that have become available recently as a result of studies of the similarity relations between words as found in large computerized text corpora. FREQUENCY DISTRIBUTIONS Various models for word frequency distributions have been developed since Zipf (1935) ap...
Asymptotic Behaviors of the Lorenz Curve for Left Truncated and Dependent Data
The purpose of this paper is to provide some asymptotic results for nonparametric estimator of the Lorenz curve and Lorenz process for the case in which data are assumed to be strong mixing subject to random left truncation. First, we show that nonparametric estimator of the Lorenz curve is uniformly strongly consistent for the associated Lorenz curve. Also, a strong Gaussian approximation for ...
Power laws and the golden number
The distribution of many real discrete random variables (e.g., the frequency of words, the population of cities) can be approximated by a zeta distribution, that is known popularly as Zipf’s law, or power law in physics. Here we revisit the relationship between power law distribution of a magnitude and the corresponding power relationship between the magnitude of a certain element and its rank....
Probabilistic Reuse of Past Search Results
In this paper, a new Monte Carlo algorithm to improve precision of information retrieval by using past search results is presented. Experiments were carried out to compare the proposed algorithm with traditional retrieval on a simulated dataset. In this dataset, documents, queries, and judgments of users were simulated. Exponential and Zipf distributions were used to build document collections....
A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution
We present a message-passing based parallel version of the Space Saving algorithm designed to solve the k–majority problem. The algorithm determines in parallel frequent items, i.e., those whose frequency is greater than a given threshold, and is therefore useful for iceberg queries and many other different contexts. We apply our algorithm to the detection of frequent items in both real and syn...
Journal title:
- CoRR
Volume abs/1511.01480
Publication date 2015